ABSTRACT

In some correlational studies it is not reasonable to assume that
bivariate observations are uncorrelated. An example would be a
configural analysis in which two individuals are correlated across
several variables (e.g., Q-technique). The present study was a
Monte Carlo investigation of the robustness of techniques used in
judging the magnitude of a sample correlation coefficient when
observations are correlated. Empirical distributions of r, t, and
Fisher's z were generated. Patterns of correlation were found which
caused error rates to be as high as .20 when the nominal alpha was
.05. A technique for controlling error rates in certain situations
is suggested. (Author)



Introduction

In order to use the distribution of the sample correlation
coefficient, r, in testing its statistical significance, one of the
necessary assumptions is that the bivariate observations be
independent. Situations arise in which this assumption may not be
warranted. In taxonomic problems, a Q-correlation is one index
which can be used to judge profile similarity. Here people rather
than variables are correlated, and the bivariate observations used
in the calculation of r are not, in general, independent since each
person has a score in each bivariate observation. Another example
arises in correlating the observations of two judges who have rated
the same person or group on a number of dimensions. An investigator
attempting to judge the magnitude of a correlation coefficient in
such situations might be unwilling to refer r to Fisher's
distributions, for they are based on a different model.

Purpose

Consider the n x 2 data matrix $Y = [y_{ij}]$ in which the rows are
indexed by $i = 1, \ldots, n$, and the columns are subscripted by
$j = 1, 2$. If the rows are randomly drawn observation vectors from a
bivariate normal distribution, then the distribution of the sample
product moment correlation coefficient between the columns of Y,
$r_{12}$, is known. Fisher (1914) obtained the distributions for both
the $\rho_{12} = 0$ and $\rho_{12} \neq 0$ cases. Morrison (1962) and
MacGregor (1962) studied the distribution of $r_{12}$ when the rows of
Y are not independent. Both authors made restrictive assumptions
about the pattern of correlation coefficients among observations, or
values of $\rho_{ij,i'j'}$. The objective of the present study was
twofold: 1) to construct a computer program which could generate
sample values of $r_{12}$ based on observations from populations in
which the population correlation structure describing an n x 2 matrix
Y could be specified by the user; and 2) to obtain empirical
distributions of $r_{12}$ so that the robustness of the techniques
used to judge the magnitude of $r_{12}$ could be observed. Since the
number of parameters which must be specified in this type of
investigation is large (in the $n = 10$ case, for example, 190
parameters must be specified), Monte Carlo methods and limited
computing time prohibit the complete specification of a family of
distributions. The purpose, therefore, was to determine if error
rates could be affected by dependencies among the data and to
investigate variables which might relate to any existing lack of
robustness.

For convenience, this investigation is discussed using terminology
associated with hypothesis testing, such as $\alpha$ level and error
rate. However, there is a direct application of the findings to
areas in which hypothesis testing in the usual sense is not of
interest. For example, in studies where Q-correlations are based on
two randomly drawn observation vectors, the investigator knows that
$E(r_{12}) = 0$; therefore, he is not interested in testing
$H_0\colon \rho_{12} = 0$. He still, however, may be interested in
determining how extreme an obtained correlation is in the sampling
distribution of $r_{12}$.



Methods

The present Monte Carlo investigation used a Fortran program
specially written for the University of Minnesota's CDC 6600. The
program provides distributions of $r_{12}$, Student's t, and Fisher's
z, computes the first four moments, gives plots, and tallies the
extreme values of each distribution. Restrictions in the present
program are $n < 20$ and a given population correlation matrix must
be positive definite.

The program user initializes the values of the 2n x 2n matrix
$R = [\rho_{ij,i'j'}]$, the underlying population correlation matrix
for the elements of Y. The n x n submatrices of R, $R_{11}$ and
$R_{22}$, specify the population correlations for the elements within
the first and second columns of Y, respectively. $R_{12}$ and
$R_{21} = R_{12}'$ specify the population values for relationships
between elements in the two columns. The program simulates the
situation in which n-dimensional vectors from a multivariate normal
population are selected and then correlated.

This investigation builds on a result obtained by Morrison (1962)
which, using the present notation, can be written as

$$\rho^{*}_{12} = \frac{\rho_{i1,i2} - \rho_{i1,i'2}}{\left[(1 - \rho_{i1,i'1})(1 - \rho_{i2,i'2})\right]^{1/2}} \qquad [1]$$

In this expression, $\rho^{*}_{12}$ is a non-centrality parameter for
Fisher's distribution of $r_{12}$, $i \neq i'$, and the $\rho$ in the
equation are constants as i or i' vary from 1 to n. In other words,
the distribution of $r_{12}$ based on correlated observations is known
if the off-diagonal elements of $R_{11}$, $R_{22}$, and $R_{12}$ are
equal to constants (possibly the same constant) and the diagonal
terms of $R_{12}$ equal a constant.

In some psychological applications the assumption of constant
correlation required in Morrison's equation would be overly
restrictive. Therefore, the strategy was to vary the off-diagonals
in the submatrices of R and to observe the effect on error rates.
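The role of the constants in Morrison's expression can be illustrated with a short calculation. The sketch below assumes the expression has the form $\rho^{*}_{12} = (\rho_{i1,i2} - \rho_{i1,i'2}) / [(1 - \rho_{i1,i'1})(1 - \rho_{i2,i'2})]^{1/2}$; the function name, argument names, and the numerical inputs are illustrative only and do not come from the original program.

```python
import math


def morrison_noncentrality(rho_same_row, rho_cross_row, rho_within_1, rho_within_2):
    """Non-centrality parameter under constant correlations (assumed form of [1]).

    rho_same_row  : correlation between the two scores of one observation
    rho_cross_row : correlation between column-1 and column-2 scores of different rows
    rho_within_1  : correlation between different rows within column 1
    rho_within_2  : correlation between different rows within column 2
    """
    numerator = rho_same_row - rho_cross_row
    denominator = math.sqrt((1.0 - rho_within_1) * (1.0 - rho_within_2))
    return numerator / denominator


# Independent rows: the parameter is just the ordinary population correlation.
independent = morrison_noncentrality(0.4, 0.0, 0.0, 0.0)

# Unrelated columns: the parameter stays at zero however strongly the rows
# are (uniformly) correlated within each column.
null_case = morrison_noncentrality(0.0, 0.0, 0.5, 0.5)

# Hypothetical dependent rows: (0.5 - 0.2) / 0.6, i.e. the location shifts.
shifted = morrison_noncentrality(0.5, 0.2, 0.4, 0.4)

print(independent, null_case, round(shifted, 4))
```

The second call shows why constant within-column correlation alone leaves the null distribution unchanged, which is the starting point for varying the off-diagonals.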

Data and Program Verification

The computer program uses a random number generator to create
2n-dimensional vectors of independent standard normal deviates. The
elements in these vectors are then randomly permuted to insure
adequate coverage of 2n-dimensional space. Each vector is transformed
by a triangular factorization of the matrix R to obtain a vector
$z \sim N(0, R)$ (Scheuer & Stoller, 1962) and then partitioned into
two n-dimensional vectors, i.e. the two columns of Y.
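The generation step can be sketched in a few lines. This is not the original Fortran program: it is a minimal NumPy version that uses the Cholesky factor as the triangular factorization and omits the random permutation of deviates. The function name `simulate_r12`, the seed, and the example matrix (independent rows with $\rho_{12} = .4$) are assumptions made for illustration.

```python
import numpy as np


def simulate_r12(R, n, reps, seed=0):
    """Return `reps` sample correlations r12 when the 2n elements of Y
    have population correlation matrix R (assumed positive definite)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R)               # lower triangular, R = L @ L.T
    e = rng.standard_normal((reps, 2 * n))  # independent standard normal deviates
    z = e @ L.T                             # each row is one draw from N(0, R)
    y1, y2 = z[:, :n], z[:, n:]             # the two columns of Y
    y1c = y1 - y1.mean(axis=1, keepdims=True)
    y2c = y2 - y2.mean(axis=1, keepdims=True)
    num = (y1c * y2c).sum(axis=1)
    den = np.sqrt((y1c ** 2).sum(axis=1) * (y2c ** 2).sum(axis=1))
    return num / den


n = 10
rho = 0.4
# Independent rows: R11 = R22 = I and R12 = R21' = rho * I.
R = np.block([[np.eye(n), rho * np.eye(n)],
              [rho * np.eye(n), np.eye(n)]])
r = simulate_r12(R, n, reps=10_000)
print(f"mean of r12 over {r.size} realizations: {r.mean():.3f}")
```

With independent rows the empirical mean should fall slightly below .4, in line with the known downward bias of r in small samples.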

For all of the distributions of $r_{12}$ generated: a) R was
initialized by specifying the values of $R_{11}$, $R_{22}$, and
$R_{12}$ (for example, Table 5); b) $\rho_{i1,i2} = \rho_{12}$ and
$n = 10$. A separate initializing program was written which accepts
specifications for initializing R and then, if necessary, employs an
iterative numerical procedure which reduces the level or dispersion
of the correlations to obtain a positive-definite matrix.
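The iterative procedure is not described further in the text. One simple possibility, shown here purely as a hypothetical sketch, is to pull the off-diagonal correlations toward their mean by a fixed fraction until the smallest eigenvalue is positive, which reduces their dispersion while leaving their average unchanged. The function name, step size, and example matrix are all assumptions.

```python
import numpy as np


def shrink_to_positive_definite(R, step=0.05, max_iter=200):
    """Shrink off-diagonal correlations toward their mean until R is
    positive definite (hypothetical stand-in for the initializing program)."""
    R = np.array(R, dtype=float)
    off = ~np.eye(R.shape[0], dtype=bool)    # mask of off-diagonal cells
    for _ in range(max_iter):
        if np.linalg.eigvalsh(R).min() > 1e-8:
            return R
        mean_off = R[off].mean()
        # Reduce dispersion around the mean by a factor (1 - step).
        R[off] = mean_off + (1.0 - step) * (R[off] - mean_off)
    raise ValueError("no positive-definite matrix reached; mean correlation may be infeasible")


# An inconsistent set of correlations: 1 and 3 both correlate .9 with 2
# but -.9 with each other, so the matrix is not positive definite.
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
fixed = shrink_to_positive_definite(bad)
print(np.round(fixed, 3))
```

Because every cell is transformed the same way, symmetry and the unit diagonal are preserved throughout.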

The program was tested using $n = 10$ and generating correlations
which produced known distributions of $r_{12}$. In the most stringent
test, R was initialized so that Morrison's equation [1] would provide
the exact non-centrality parameter, $\rho^{*}_{12}$ (i.e. $R_{11}$,
$R_{22}$, and $R_{12}$ conformed to Morrison's requirements, and the
distribution of $r_{12}$ was known). The values of R were
$\rho_{i1,i2}$ = [illegible], $\rho_{i1,i'2}$ = [illegible],
$\rho_{i1,i'1} = \rho_{i2,i'2}$ = [illegible], $i \neq i'$, and the
non-centrality parameter was .4. Using Soper, Young, Cave, Lee, and
Pearson's (1916) tables with non-centrality parameter
$\rho^{*}_{12} = .4$ and $n = 10$, the theoretical moments of the
distribution of $r_{12}$ were compared to the obtained empirical
moments (see Table 1).



Results

The study was divided into two segments. In the first segment
(Case I), $R_{11} = R_{22}$ and $R_{12} = R_{21}' = 0$; in the second
(Case II), $R_{11} = R_{22}$ and $R_{12} = R_{21}' \neq 0$. Each
experiment reported was based on 10,000 realizations of $r_{12}$.

Case I

Since Morrison's results showed that the distribution of $r_{12}$
remained unchanged for the case assuming constant off-diagonal
elements in $R_{11}$ and $R_{22}$ and $R_{12} = R_{21}' = 0$, the first
question investigated was whether or not heterogeneity in the
off-diagonal elements in $R_{11}$ and $R_{22}$ would cause the actual
alpha level to be substantially different from the nominal value.
The correlations presented in Table 2 were used in both $R_{11}$ and
$R_{22}$ in a Monte Carlo run designed to answer this question.
Table 3 presents empirical and theoretical moments and gives the
empirical alpha level obtained when the critical value was
$t_{\alpha/2}$ with eight degrees of freedom. It is apparent from the
moments that while the distribution based on correlated observations
remains centered at zero and unskewed, it is more variable and
platykurtic, causing the obtained $\alpha$ to be more than twice as
large as the nominal level.
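A run of this kind can be imitated with a few lines of NumPy. The sketch below is not the original program. It assumes a within-column pattern in which seven elements intercorrelate .725 and every other pair correlates .275 (a block pattern of the kind shown in Table 2), sets $R_{12} = R_{21}' = 0$, and counts how often $|r_{12}|$ exceeds the usual two-tailed .05 critical value. The seed and variable names are arbitrary.

```python
import numpy as np

n, reps = 10, 10_000

# Assumed heterogeneous within-column correlations (block pattern).
A = np.full((n, n), 0.275)
A[:7, :7] = 0.725
np.fill_diagonal(A, 1.0)

# Case I structure: R11 = R22 = A and R12 = R21' = 0.
zero = np.zeros((n, n))
R = np.block([[A, zero], [zero, A]])

rng = np.random.default_rng(1)
z = rng.standard_normal((reps, 2 * n)) @ np.linalg.cholesky(R).T
y1 = z[:, :n] - z[:, :n].mean(axis=1, keepdims=True)
y2 = z[:, n:] - z[:, n:].mean(axis=1, keepdims=True)
r = (y1 * y2).sum(axis=1) / np.sqrt((y1 ** 2).sum(axis=1) * (y2 ** 2).sum(axis=1))

# Two-tailed .05 critical value of t with n - 2 = 8 degrees of freedom,
# converted to the equivalent critical value of r.
t_crit = 2.306
r_crit = t_crit / np.sqrt(t_crit ** 2 + (n - 2))
error_rate = float(np.mean(np.abs(r) > r_crit))
print(f"critical r = {r_crit:.3f}, empirical error rate = {error_rate:.3f}")
```

Under these assumptions the empirical rate should come out well above the nominal .05 even though the two columns are unrelated in the population.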

TABLE 1

PROGRAM TEST. COMPARISON OF THEORETICAL AND OBTAINED
MOMENTS AND ERROR RATES FOR THE CASE $\rho^{*}_{12} = .4$

               ERROR RATE    MEAN    VARIANCE   SKEWNESS   KURTOSIS
OBTAINED         .0500      .3795     .0853      .4337      3.1307
THEORETICAL      .0500      .3813     .0851      .4374      3.?669

(? marks a digit that is illegible in the source; the last two column
headings are illegible and are labeled from the text's description of
the moments.)
TABLE 2

CASE 1. CORRELATIONS USED IN THE RUN IN WHICH $R_{11} = R_{22}$ HAD
HIGHEST HETEROGENEITY

1.000   .725   .725   .725   .725   .725  [.725]  .275   .275  [.275]
       1.000   .725   .725   .725   .725   .725   .275   .275  [.275]
              1.000   .725   .725   .725   .725   .275   .275   .275
                     1.000   .725   .725   .725   .275   .275   .275
                            1.000   .725   .725   .275   .275  [.275]
                                   1.000   .725   .275   .275  [.275]
                                          1.000   .275   .275   .275
                                                 1.000   .275   .275
                                                        1.000   .275
                                                               1.000

(Entries in brackets are missing from the source and are inferred
from the block pattern of the legible entries.)



TABLE 3

CASE 1. COMPARISON OF MOMENTS AND ERROR RATES FOR THE
$\rho_{12} = 0$ THEORETICAL SAMPLING DISTRIBUTION AND THE EMPIRICAL
SAMPLING DISTRIBUTION OBTAINED WHEN $R_{11} = R_{22}$ HAD HIGHEST
HETEROGENEITY

               ERROR RATE     MEAN        VARIANCE     SKEWNESS   KURTOSIS
OBTAINED         .113?     [illegible]   [illegible]    .0002      2.2246
THEORETICAL      .0500       .0000         .1111        .0000      2.4545

(? marks a digit that is illegible in the source.)
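The theoretical variance and kurtosis in Table 3 can be checked by hand. The lines below rely on the standard result that, for independent bivariate normal observations with $\rho_{12} = 0$, $r^2$ follows a Beta$(1/2, (n-2)/2)$ distribution; with $n = 10$ this reproduces the tabled values .1111 and 2.4545.

$$
\begin{aligned}
E(r_{12}) &= 0, \\
\operatorname{Var}(r_{12}) = E(r_{12}^2) &= \frac{1}{n-1} = \frac{1}{9} \approx .1111, \\
E(r_{12}^4) &= \frac{3}{(n-1)(n+1)}, \\
\frac{E(r_{12}^4)}{[E(r_{12}^2)]^2} &= \frac{3(n-1)}{n+1} = \frac{27}{11} \approx 2.4545 .
\end{aligned}
$$

The kurtosis of the independent-observation distribution is therefore already below 3, and the obtained value of 2.2246 lies lower still.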
The heterogeneity of the correlations in Table 2 approaches its
maximum given the positive-definiteness constraint mentioned above.
$R_{11} = R_{22}$ was first set with highly heterogeneous values and
iterated by the initializing program until the positive-definite
constraint was met. Because it was believed that these extremely
variable correlations were unrealistic for many types of
psychological data, it was decided to investigate matrices with less
variance among the $\rho_{i1,i'1}$. Two levels of variance were
chosen, $\sigma^2_L = .004$ and $\sigma^2_H = .016$, where $\sigma^2$
is based on the off-diagonals of $R_{11}$. Mean levels of correlation
were studied within each level of variance to determine if, given a
particular level of heterogeneity among the correlations, the
average level of correlation would have an effect ($\mu_L = .285$ and
$\mu_H = .569$). Table 4 presents the obtained moments for these four
runs and the actual $\alpha$ levels obtained when a critical value of
$t_{\alpha/2}$ with eight degrees of freedom was used.

It is clear from these results that variance plays the predominant
role in affecting the error rates and that once a degree of
heterogeneity is established, the mean level of correlation also
has some effect. Table 5 contains the correlations used to initialize
$R_{11} = R_{22}$ with low mean and low variance so that the reader
might contrast the correlation matrices used for the highest and the
lowest obtained error rates in Case I.
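The mean and variance that define a condition can be recomputed from the off-diagonals of a matrix. The sketch below enters the upper triangle of Table 5 as it can best be read from the source (two unclear cells are taken to be .255 and .197, which is an assumption) and computes the mean and the sample variance of the 45 off-diagonal correlations.

```python
import numpy as np

# Upper-triangle rows of Table 5 (diagonal omitted), as read from the source.
upper_rows = [
    [.371, .371, .371, .371, .255, .255, .255, .255, .255],
    [.371, .371, .371, .255, .255, .255, .255, .255],
    [.371, .371, .313, .313, .313, .255, .255],
    [.371, .313, .313, .313, .313, .313],
    [.313, .313, .313, .313, .313],
    [.197, .197, .197, .197],
    [.197, .197, .197],
    [.197, .197],
    [.197],
]

off_diagonals = np.concatenate([np.asarray(row) for row in upper_rows])
mean_off = off_diagonals.mean()
var_off = off_diagonals.var(ddof=1)   # sample variance of the 45 values

print(f"{off_diagonals.size} off-diagonals, mean = {mean_off:.4f}, variance = {var_off:.4f}")
```

Under this reading the results are close to the low-mean, low-variance condition described in the text ($\mu_L = .285$, $\sigma^2_L = .004$).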



Case II

As with Case I, the computer runs in Case II were designed to
deviate from Morrison's findings in that the correlation coefficients
in $R_{11} = R_{22}$ and the off-diagonals of $R_{12} = R_{21}'$ were
not equal to

TABLE 4

CASE 1. COMPARISON OF ERROR RATES AND MOMENTS
FOR THE EMPIRICAL SAMPLING DISTRIBUTIONS OF $r_{12}$ OBTAINED
WHEN LEVEL, $\mu$, AND HETEROGENEITY, $\sigma^2$, OF $R_{11} = R_{22}$
WERE VARIED

                          ERROR RATE    MEAN     VARIANCE     SKEWNESS     KURTOSIS
$\sigma^2_L$, $\mu_L$       .0517      .0036      .1128     [illegible]     2.4289
$\sigma^2_L$, $\mu_H$       .0536      .0013    [illegible]    .0003        2.4?5?
$\sigma^2_H$, $\mu_L$       .0567      .0016      .1168        .0002        2.4314
$\sigma^2_H$, $\mu_H$       .0695     -.0041      .1257        .0000        2.3531

(? marks a digit that is illegible in the source.)



TABLE 5

CASE 1. CORRELATIONS IN $R_{11} = R_{22}$ WHICH YIELDED THE LOWEST
ERROR RATE IN CASE 1

1.000   .371   .371   .371   .371   .255   .255   .255   .255   .255
       1.000   .371   .371   .371   .255   .255   .255   .255   .255
              1.000   .371   .371   .313   .313   .313   .255   .255
                     1.000   .371   .313   .313   .313   .313   .313
                            1.000   .313   .313   .313   .313   .313
                                   1.000   .197   .197   .197   .197
                                          1.000   .197   .197   .197
                                                 1.000   .197  [.197]
                                                        1.000   .197
                                                               1.000

(The entry in brackets is missing from the source and is inferred
from the pattern of the legible entries.)



constants. Given Morrison's findings and the results from Case I,
two hypotheses were employed in designing computer runs for Case II:
1) If an analog to Morrison's non-centrality parameter, say
$\rho^{*}_{12}$, is computed by replacing the constants which he
specifies with the averages of the off-diagonal elements in
$R_{11} = R_{22}$ and $R_{12} = R_{21}'$, the discrepancy between
$\rho^{*}_{12}$ and the diagonal of $R_{12} = R_{21}'$,
$\delta = |\rho^{*}_{12} - \rho_{12}|$ (where $\rho_{i1,i2} = \rho_{12}$),
should relate to variability among the error rates (this hypothesis
was motivated by inspection of Morrison's equation, which shows that
when $\rho_{12} \neq 0$ the actual location of the distribution is
changed when dependencies across bivariate observations exist);
2) given a value of $\delta$, variability of the correlations around
the average values substituted into Morrison's equation should also
explain variability among the error rates.

The conditions studied were $\delta_1 = .006$, $\delta_2 = .035$,
$\delta_3 = .158$, and $\delta_4 = .262$. Nested within each level of
$\delta$ were two levels of variance, $\sigma^2_H$ and $\sigma^2_L$.
Matrices for the $\sigma^2_L$ case were created by holding the mean
levels of the off-diagonals of $R_{11} = R_{22}$ and
$R_{12} = R_{21}'$ constant, so that $\rho^{*}_{12}$ was unaffected,
and then halving the standard deviation of these off-diagonals (see
Table 6).
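The discrepancy $\delta$ can be recomputed from the means reported in Table 6. The sketch below assumes that the analog $\rho^{*}_{12}$ is obtained by putting the mean off-diagonal of $R_{12}$ and the mean off-diagonal of $R_{11} = R_{22}$ into Morrison's expression, and it uses the Table 6 entries as they can be read from the source. The fourth condition is omitted because one of its means is illegible. Function and variable names are illustrative.

```python
import math


def noncentrality_analog(rho_12, mean_r12_off, mean_r11_off, mean_r22_off):
    """Analog of Morrison's parameter with constants replaced by averages."""
    return (rho_12 - mean_r12_off) / math.sqrt(
        (1.0 - mean_r11_off) * (1.0 - mean_r22_off)
    )


# (rho_12, mean off-diagonal of R12, mean off-diagonal of R11 = R22)
conditions = {
    "delta_1": (0.800, 0.4528, 0.5693),
    "delta_2": (0.500, 0.1678, 0.2843),
    "delta_3": (0.400, 0.2292, 0.2854),
}

deltas = {}
for name, (rho_12, mu_12, mu_11) in conditions.items():
    rho_star = noncentrality_analog(rho_12, mu_12, mu_11, mu_11)
    deltas[name] = abs(rho_star - rho_12)
    print(f"{name}: rho* = {rho_star:.3f}, delta = {deltas[name]:.3f}")
```

The recomputed values agree with the reported $\delta_1$, $\delta_2$, and $\delta_3$ to within a few thousandths, which is about what rounding of the tabled means would allow.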

The error rates obtained from these eight runs varied from .0527 to
.2011. They were obtained by computing the proportion of
correlations which exceeded a critical value based on Fisher's
z-transformation, and therefore simulate the situation in which a
researcher tests the significance of an obtained correlation at a
nominal $\alpha = .05$ using Fisher's model (i.e. where
$R_{11} = R_{22} = I$ and



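TABLE 6

CASE 2. $\rho_{12}$, MEANS, AND VARIANCES FOR THE MATRICES OF CASE 2

                                      $\delta_1$=.006  $\delta_2$=.035  $\delta_3$=.158  $\delta_4$=.262
Mean of off-diagonals, R11 = R22          .5693           .2843           .2854        [illegible]
Mean of off-diagonals, R12 = R21'         .4528           .1678           .2292           .5112
$\sigma^2_L$, R11 = R22                   .0039           .0039           .0010           .0010
$\sigma^2_L$, R12 = R21'                  .0039           .0039           .0010           .0010
$\sigma^2_H$, R11 = R22                   .0157           .0157           .0039           .0039
$\sigma^2_H$, R12 = R21'                  .0156           .0156           .0039           .0039
$\rho_{12} = \rho_{i1,i2}$                .800            .500            .400            .700

(Row labels for the means and variances are largely illegible in the
source and are inferred from the text.)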

$R_{12} = R_{21}'$ diagonal, with nonzero elements equal to
$\rho_{12}$). Critical values for the z-statistic were based on
moments given by Kendall and Stuart (1963) rather than the
approximations typically found in applied statistics texts. The
results in Table 7 clearly show that a researcher may be working at
an $\alpha$ level much higher than the announced value. The
discrepancy, $\delta = |\rho^{*}_{12} - \rho_{12}|$, explains most of
the discrepancy among the error rates. Variance has little effect
relative to this discrepancy. In fact, if $\rho^{*}_{12}$ had been
used in obtaining cutoff points rather than $\rho_{12}$, the
rejection rates for the four Case II matrices producing the highest
error rates, .201, .198, .084, and .083, would have been .057, .053,
.055, and .052 respectively.
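The effect of the choice of cutoff can be illustrated with the textbook form of Fisher's z test. This sketch is an assumption-laden simplification: it uses the usual approximation (standard error $1/\sqrt{n-3}$ and a normal critical value of 1.96) rather than the Kendall and Stuart moments used in the study, and it takes $\rho^{*}_{12} \approx .438$ for the $\delta_4$ condition, inferred from $\rho_{12} = .700$ and $\delta_4 = .262$.

```python
import math


def fisher_z_reject(r, rho_0, n, z_crit=1.96):
    """Two-tailed test of H0: rho = rho_0 using Fisher's z-transformation
    with the textbook standard error 1 / sqrt(n - 3)."""
    z_obs = math.atanh(r)
    z_null = math.atanh(rho_0)
    standard_error = 1.0 / math.sqrt(n - 3)
    return abs(z_obs - z_null) / standard_error > z_crit


n = 10
r_obtained = 0.0   # hypothetical obtained correlation

# Cutoffs centered on rho_12 = .700 (Fisher's model with independent rows).
reject_with_rho_12 = fisher_z_reject(r_obtained, 0.700, n)

# Cutoffs centered on the analog rho*_12 (assumed to be about .438).
reject_with_rho_star = fisher_z_reject(r_obtained, 0.438, n)

print(reject_with_rho_12, reject_with_rho_star)
```

The same obtained correlation is rejected when the cutoffs are centered on $\rho_{12}$ but not when they are centered on $\rho^{*}_{12}$, which is the mechanism behind the drop in rejection rates reported above.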

Conclusions

When a researcher has correlation coefficients based on data in
which a degree of correlation exists not only between the columns
of the data matrix but also among the rows and he wishes to judge
the magnitude of such correlations, he should not assume that
values based on the t-distribution (if the hypothesis is
$\rho_{12} = 0$) or on Fisher's z-transformation (if the hypothesis
is $\rho_{12}$ = some constant) will give him a test of size
$\alpha$. There are situations in which the error rate can be four
times that of the nominal $\alpha$ and, of course, there may be
situations other than those investigated in which the situation is
actually worse.

What has been learned is that variability among the correlations
in the off-diagonals of $R_{11} = R_{22}$ (when
$R_{12} = R_{21}' = 0$) affects error rates and, given that
variability exists, the



TABLE 7

CASE 2. ERROR RATES AND MOMENTS FOR THE EMPIRICAL
SAMPLING DISTRIBUTIONS OF $r_{12}$

                              ERROR RATE    MEAN    VARIANCE   SKEWNESS   KURTOSIS
$\delta_1$, $\sigma^2_L$        .0527      .7851     .0210      3.1988     8.3180
$\delta_1$, $\sigma^2_H$        .0619      .7789     .0237      3.0181     7.4546
$\delta_2$, $\sigma^2_L$        .0540      .4421     .0777       .6267     3.5143
$\delta_2$, $\sigma^2_H$        .0600      .4365     .0812       .5914     3.3506
$\delta_3$, $\sigma^2_L$        .0833      .2280     .1037       .1570     2.6611
$\delta_3$, $\sigma^2_H$        .0840      .2222     .1032       .1360     2.66?6
$\delta_4$, $\sigma^2_L$        .1977      .4193     .0802       .?168     3.2785
$\delta_4$, $\sigma^2_H$        .2011      .4174     .0834       .3305     3.2746

(? marks a digit that is illegible in the source; row labels are
partly illegible and are inferred from the order of the conditions.)
level of correlation also plays a role. In the non-null case,
$R_{12} = R_{21}' \neq 0$, it was found that heterogeneous
correlations in the off-diagonals of $R_{11} = R_{22}$ and
$R_{12} = R_{21}'$ also affect error rates. However, there is utility
in using cutoff points based on $\rho^{*}_{12}$ rather than
$\rho_{12}$, for these cutoff points resulted in error rates close to
$\alpha$. Of course, the results of this study should only be applied
to situations in which the data have correlations similar to those
investigated here. It is therefore recommended that researchers use
the Monte Carlo method to study the effect of the correlation
patterns which underlie the data they tend to investigate. A copy of
the programs used in the present investigation may be obtained from
the authors.